ABSTRACT

Music production involves a wide range of techniques that 
can be described by natural language. Capturing these de- 
scriptions at the source allows us to understand the inten- 
tions of an engineer and consequently develop intelligent 
tools and interfaces by computationally modelling them. In 
this paper we present a database architecture for capturing 
these attributes in a digital audio workstation, in which we 
retain audio features, audio-effect parameters, user infor- 
matics and semantic descriptions of the audio transforma- 
tions. This allows us to build a comprehensive map of the 
audio engineering workflow using linked-data, which can 
be utilised on the semantic web. We show that attributes
such as provenance, which is omitted from relational database
models, can be a useful indicator of data validity.

1. INTRODUCTION 

In music production, natural language is often used to
describe timbral transformations. Recently, these descriptions
have been the focus of intelligent music production research
[1], as they allow for the development of systems that
provide intuitive control of technical processes. To facilitate
this, we present a model for the representation of semanti- 
cally annotated music production data. 

2. THE SAFE ONTOLOGY 

The SAFE Ontology is an extension of the Studio [2] and
Audio Effects [3] Ontologies, designed to represent the
application of audio effects in music production and the
semantic descriptions thereof. The data is gathered using the
SAFE audio plug-ins and comprises:

• Details of the processing applied to a signal (which
audio effect was used and its parameter settings).

• A semantic description of the timbral effect of the 
processing. 

• Audio features of the signal before and after process- 
ing. 

• Metadata about the signal (instrument, genre). 

• Metadata about the processing (location). 

• Metadata about the user (age, primary language,
production experience).

• Provenance of the above data (how the data was
produced, e.g. human input or computer analysis).

The SAFE Ontology is available at
http://www.semanticaudio.co.uk/datasets/safe-rdf

2.1. Transform Data 

Each entry in the SAFE dataset is described using the
studio:Transform concept. Each entry applies a transform to a
set of input signals to produce a set of output signals. Each
is given a semantic label, its safe:DescriptorItem, describing
the timbral effect it had on these signals. In the SAFE
Ontology the transforms are described using the set of RDF
triples shown in Figure 1.



Figure 1: The structure used to describe the application of
an audio effect.

Metadata items are used to provide details about the
application domain of the effect. Each safe:MetadataItem
describes one property of an object, the property being
identified using an rdfs:label and the description using an
rdfs:comment. Each object described by metadata has its own
set of properties: “genre” and “instrument” metadata tags, for
example, describe an audio signal (mo:Signal), while
“location” tags describe a transform.
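
As an illustration of the structure described in this section, the following minimal sketch holds a transform, its descriptor and its metadata items as plain subject-predicate-object tuples. Only studio:Transform, safe:DescriptorItem, safe:MetadataItem, rdfs:label and rdfs:comment come from the text; every identifier and predicate under the ex: prefix, and all example values, are hypothetical placeholders and should not be read as terms of the SAFE Ontology.

```python
from typing import Dict, List, NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


def metadata_item(item_id: str, target: str, prop: str, value: str) -> List[Triple]:
    """Describe one property of `target` as a safe:MetadataItem."""
    return [
        Triple(item_id, "rdf:type", "safe:MetadataItem"),
        Triple(item_id, "rdfs:label", prop),     # which property is described
        Triple(item_id, "rdfs:comment", value),  # the description itself
        Triple(target, "ex:metadata", item_id),  # placeholder linking predicate
    ]


# One transform with an input signal, an output signal and a descriptor.
# All ex: predicates are assumed names used for illustration only.
triples: List[Triple] = [
    Triple("ex:transform1", "rdf:type", "studio:Transform"),
    Triple("ex:transform1", "ex:inputSignal", "ex:signalIn"),
    Triple("ex:transform1", "ex:outputSignal", "ex:signalOut"),
    Triple("ex:transform1", "ex:descriptor", "ex:descriptor1"),
    Triple("ex:descriptor1", "rdf:type", "safe:DescriptorItem"),
    Triple("ex:descriptor1", "ex:term", "warm"),
]
triples += metadata_item("ex:meta1", "ex:signalIn", "genre", "jazz")
triples += metadata_item("ex:meta2", "ex:signalIn", "instrument", "piano")
triples += metadata_item("ex:meta3", "ex:transform1", "location", "London")


def labels_for(target: str, graph: List[Triple]) -> Dict[str, str]:
    """Collect the label/comment pairs of every metadata item attached to `target`."""
    item_ids = {t.obj for t in graph
                if t.subject == target and t.predicate == "ex:metadata"}
    result: Dict[str, str] = {}
    for item_id in item_ids:
        props = {t.predicate: t.obj for t in graph if t.subject == item_id}
        result[props["rdfs:label"]] = props["rdfs:comment"]
    return result


signal_metadata = labels_for("ex:signalIn", triples)
transform_metadata = labels_for("ex:transform1", triples)
```

Here the signal carries the "genre" and "instrument" items while the transform carries the "location" item, mirroring the split described above.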

2.2. Audio Feature Data 

The analysis of an audio signal is described using the
safe:FeatureExtractionTransform concept. This is similar
to the studio:Transform concept but uses an audio signal to
generate a time series of feature values. Every signal used 
by a transform has its own set of feature extraction trans- 
forms which describe it. The temporal locations of each fea- 
ture value within a signal are described through use of the 
Timeline Ontology. Audio features are taken from both the 
input and output signals to aid in semantic analysis. Patterns 
found in the audio features suggest that a term describes the 
output signal of a transform, whereas patterns found in the 
change in audio features between input and output signals 
suggest a term describes the effects of the transform itself. 
This is highlighted in Figure 2.



Figure 2: The structure used to describe the features of an 
audio signal. 
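
The distinction between features of the output signal and the change in features between input and output can be made concrete with a short sketch. The feature names, the values and the use of the per-feature mean as a summary are assumptions made for illustration; the text does not specify how the time series are summarised.

```python
from statistics import mean
from typing import Dict, List

# Hypothetical feature time series: feature name -> one value per analysis frame.
FeatureSeries = Dict[str, List[float]]


def summarise(features: FeatureSeries) -> Dict[str, float]:
    """Reduce each feature time series to its mean value (an assumed summary)."""
    return {name: mean(values) for name, values in features.items()}


def feature_change(inp: FeatureSeries, out: FeatureSeries) -> Dict[str, float]:
    """Change in each summarised feature from the input to the output signal."""
    before, after = summarise(inp), summarise(out)
    return {name: after[name] - before[name] for name in before if name in after}


input_features = {"spectral_centroid": [1000.0, 1200.0], "rms": [0.25, 0.25]}
output_features = {"spectral_centroid": [1500.0, 1700.0], "rms": [0.25, 0.25]}

output_summary = summarise(output_features)               # describes the output signal
change = feature_change(input_features, output_features)  # describes the transform
```

A descriptor that correlates with `output_summary` would be read as describing the output signal, while one that correlates with `change` would be read as describing the effect of the transform.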

3. PROVENANCE DATA 

The SAFE Ontology makes extensive use of the Provenance 
Ontology to record the origins of the various data. The in- 
terface of the SAFE plug-ins requires that the user provide 
a semantic description of their use of the plug-in. Therefore
the provenance of every safe:DescriptorItem is attributed to
the user who saved it. It is, however, not mandatory for
users to fill in the metadata fields. These can be later pop- 
ulated through analysis of the audio features of the signal. 
The provenance of the metadata items provides a method to
distinguish between the more reliable user-submitted
metadata and the less reliable computer-generated metadata.
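
A minimal sketch of how recorded provenance could be used to separate the two kinds of metadata is given below. The record layout and the agent labels "human" and "software" are assumptions made for illustration and are not quoted from the Provenance Ontology or the SAFE Ontology.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Illustrative agent labels (assumed, not ontology terms).
HUMAN = "human"
SOFTWARE = "software"


@dataclass(frozen=True)
class MetadataItem:
    target: str      # the signal or transform being described
    label: str       # property name, e.g. "genre"
    comment: str     # property value, e.g. "jazz"
    agent_kind: str  # who produced the item: HUMAN or SOFTWARE


def split_by_provenance(
    items: List[MetadataItem],
) -> Tuple[List[MetadataItem], List[MetadataItem]]:
    """Separate user-submitted items from computer-generated ones."""
    human = [item for item in items if item.agent_kind == HUMAN]
    machine = [item for item in items if item.agent_kind == SOFTWARE]
    return human, machine


items = [
    MetadataItem("signal1", "genre", "jazz", HUMAN),
    MetadataItem("signal1", "instrument", "piano", SOFTWARE),
    MetadataItem("signal2", "genre", "rock", SOFTWARE),
]
human_items, machine_items = split_by_provenance(items)
```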

Missing metadata is estimated using a collaborative
filtering technique commonly associated with recommender
systems [5]. The reliability of this computer-generated
metadata is then compared against that provided by users. Firstly,
redundant data is removed by applying principal component 
analysis to the audio feature data associated with each sig- 
nal. The first 10 components are retained, describing over 
97% of the total variance. The reliability of the metadata is 
measured as the mean within-class variance across the first 
10 principal components for each metadata tag. The results
of this are shown in Table 1.


Genre                             Instrument
Tag             var_h   var_m     Tag         var_h   var_m
blues           0.031   0.035     bass*       0.024   0.032
classical*      0.027   0.031     drums*      0.008   0.033
electronica     0.002   0.004     guitar      0.026   0.017
experimental*   0.011   0.188     hi-hat*     0.000   0.014
funk            0.026   0.021     kick*       0.015   0.023
jazz*           0.014   0.020     organ       0.025   0.026
metal           0.027   0.014     piano*      0.031   0.095
pop             0.015   0.019     snare       0.028   0.022
reggae          0.050   0.040     trumpet*    0.045   0.072
rock            0.145   0.009     vocals      0.045   0.011
mean            0.035   0.038     mean        0.025   0.035


Table 1: The variance across the first 10 principal
components for each of the tags in the genre and instrument
categories using human-labelled (var_h) and machine-labelled
(var_m) metadata entries. * represents a significant increase
in variance (p < .005).
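
The reliability measure reported in Table 1, the mean within-class variance across principal components for each tag, could be computed along the following lines. This is a sketch under stated assumptions: the principal component scores are taken as already computed, population variance is used because the text does not say whether population or sample variance was applied, and the example scores and tags are invented.

```python
from collections import defaultdict
from statistics import mean, pvariance
from typing import Dict, List, Sequence


def mean_within_class_variance(
    scores: Sequence[Sequence[float]], tags: Sequence[str]
) -> Dict[str, float]:
    """For each tag, average the per-component variance of its entries.

    `scores` holds one row of principal component scores per signal and
    `tags` holds the metadata tag assigned to the same signal.
    """
    rows_by_tag: Dict[str, List[Sequence[float]]] = defaultdict(list)
    for row, tag in zip(scores, tags):
        rows_by_tag[tag].append(row)

    result: Dict[str, float] = {}
    for tag, rows in rows_by_tag.items():
        # zip(*rows) yields one column per principal component.
        per_component = [pvariance(column) for column in zip(*rows)]
        result[tag] = mean(per_component)
    return result


# Invented two-component scores for four signals.
scores = [[0.0, 1.0], [2.0, 1.0], [5.0, 5.0], [5.0, 9.0]]
tags = ["jazz", "jazz", "rock", "rock"]
variances = mean_within_class_variance(scores, tags)
```

Running the same function once on human-labelled entries and once on machine-labelled entries would give the two quantities compared in the table, a lower value indicating that entries sharing a tag lie closer together in feature space.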

The results show that the machine-labelled entries exhibit 0.001
higher variance than those labelled by humans for genre classes
and 0.01 higher variance for instrument classes. Whilst only a
subset of the tags exhibit a statistically significant increase in
variance, this suggests that the Provenance Ontology plays an
important role in describing the reliability of both instrument
and genre tags.